DOCUMENT FILE RETRIEVAL DEVICE AND MACHINE READABLE RECORDING MEDIUM
RECORDING PROGRAM



PUB. NO.:      11-306205 [JP 11306205 A]
PUBLISHED:     November 05, 1999 (19991105)
INVENTOR(S):   SHIMAZU HIDEO
APPLICANT(S):  NEC CORP
APPL. NO.:     10-129485 [JP 98129485]
FILED:         April 23, 1998 (19980423)
INTL CLASS:    G06F-017/30



ABSTRACT 

PROBLEM TO BE SOLVED: To realize a retrieval inquiry about a WWW home page 
by a natural language. 

SOLUTION: A WWW home page being a retrieval object document file is
described in XML. When a retrieval condition composition is inputted,
a keyword extraction part 4 converts a natural language expression
expressing an attribute name into an attribute name index including the
attribute name and also converts the natural language expression expressing
the attribute value into an attribute value index including a pair of
the said attribute name and attribute value. A keyword filter part
5 deletes the attribute name index existing at a place where the
attribute name and the attribute value of the same attribute
exist adjacent to each other in a converted index string. A document
contents check part 6 checks whether or not a tag corresponding to the pairs
of attribute name and value of all the attribute value indexes
exists in the retrieval object document file. If the said tag exists, a
document contents output part 9 retrieves and outputs the attribute
value of the tag having the relevant attribute name of the attribute
name index that is included in the converted index string.
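
The filter, check and output steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify data structures, so the classes `NameIndex` and `ValueIndex`, the functions `filter_indexes` and `retrieve`, the sample query and the sample page are all assumptions made for illustration. In particular, the sketch assumes that an attribute is stored as an XML element whose tag is the attribute name and whose text is the attribute value.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class NameIndex:
    """Index built from an expression that names an attribute (hypothetical)."""
    name: str


@dataclass(frozen=True)
class ValueIndex:
    """Index built from an expression that gives an attribute value (hypothetical)."""
    name: str
    value: str


Index = Union[NameIndex, ValueIndex]


def filter_indexes(indexes: List[Index]) -> List[Index]:
    """Drop a NameIndex that is adjacent to a ValueIndex of the same attribute."""
    kept = []
    for i, idx in enumerate(indexes):
        if isinstance(idx, NameIndex):
            neighbours = indexes[max(i - 1, 0):i] + indexes[i + 1:i + 2]
            if any(isinstance(n, ValueIndex) and n.name == idx.name
                   for n in neighbours):
                continue  # the adjacent value index already carries this name
        kept.append(idx)
    return kept


def retrieve(xml_text: str, indexes: List[Index]) -> List[str]:
    """Return values of the named attributes if every value index matches a tag."""
    root = ET.fromstring(xml_text)
    for idx in indexes:
        if isinstance(idx, ValueIndex):
            texts = [(el.text or "").strip() for el in root.iter(idx.name)]
            if idx.value not in texts:
                return []  # a required name/value pair is missing
    wanted = [idx.name for idx in indexes if isinstance(idx, NameIndex)]
    return [(el.text or "").strip() for name in wanted for el in root.iter(name)]


# Hypothetical query: "price of the hotel whose city is Tokyo"
query = [NameIndex("price"), NameIndex("city"), ValueIndex("city", "Tokyo")]
filtered = filter_indexes(query)

page = "<hotel><name>Sakura</name><city>Tokyo</city><price>12000</price></hotel>"
answer = retrieve(page, filtered)
```

In this sketch the name index for `city` is dropped because it sits next to the value index for the same attribute, so only the price is returned for a page that matches.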



COPYRIGHT: (C) 1999, JPO 
DIALOG(R) File 350: Derwent WPIX

(c) 2003 Thomson Derwent . All rts. reserv. 

015589150 **Image available** 
WPI Acc No: 2003-651305/200362 
XRPX Acc No: N03-518254 

XML document conversion method involves converting tag name of non-key 
component into attribute value corresponding to prescribed attribute 
name assigned to new component 
Patent Assignee: FUJITSU LTD (FUIT ) 
Inventor: ITANI N; YAHAGI H; YOSHIDA S 
Number of Countries: 002 Number of Patents: 002 
Patent Family: 

Patent No       Kind  Date      Applicat No     Kind  Date      Week
JP 2003203067   A     20030718  JP 2001401934   A     20011228  200362 B
US 20030158854  A1    20030821  US 2002274230   A     20021021  200362

Priority Applications (No Type Date): JP 2001401934 A 20011228
Patent Details:
Patent No       Kind  Lan  Pg  Main IPC     Filing Notes
JP 2003203067   A          56  G06F-017/21
US 20030158854  A1             G06F-007/00

Abstract (Basic): JP 2003203067 A 

NOVELTY - A tag name assigned to a non-key component of an XML
document, is converted into an attribute value corresponding to
prescribed attribute name which is assigned to a new component. The
character string in the tag name of the non-key component is defined as
the content of the new component, and the key component of the XML
document is retained.
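
One possible reading of this conversion is sketched below in Python. It is an illustration under stated assumptions, not the method of the patent: the set of key tag names, the new component name `item`, the prescribed attribute name `name`, and the function `convert` are all hypothetical, and the sketch assumes that the text of each non-key component is carried over as the content of the new component.

```python
import xml.etree.ElementTree as ET


def convert(root, key_tags, new_tag="item", attr_name="name"):
    """Return a converted copy of `root`.

    Key components keep their tag. Every other component becomes `new_tag`,
    and its old tag name is stored as the value of the attribute `attr_name`.
    """
    def walk(el):
        if el.tag in key_tags:
            out = ET.Element(el.tag, dict(el.attrib))
        else:
            out = ET.Element(new_tag, {**el.attrib, attr_name: el.tag})
        out.text = el.text  # character content is carried over unchanged
        out.tail = el.tail
        for child in el:
            out.append(walk(child))
        return out

    return walk(root)


# Hypothetical document in which <product> and <id> are the key components.
src = ET.fromstring(
    "<product><id>42</id><color>red</color><size>L</size></product>"
)
converted = convert(src, key_tags={"product", "id"})
result = ET.tostring(converted, encoding="unicode")
```

After conversion, all non-key components share one tag name, so a consumer only needs to know the key tags and the single generic component.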

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for 
data conversion method. 

USE - For converting character strings of XML document. 

ADVANTAGE - The XML documents are converted into compressed data 
values, thereby reducing the memory space for storing the XML 
documents. Also the processing line is reduced. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
the data structure conversion system. (Drawing includes non-English 
language text) . 

XML document conversion processor (10)
extensible style sheet language transformation (XSLT) converter (11)
XSLT structural transformation unit (12)
application software (30)
pp; 56 DwgNo 2/48

Title Terms: DOCUMENT; CONVERT; METHOD; CONVERT; TAG; NAME; NON; KEY; 

COMPONENT; ATTRIBUTE; VALUE; CORRESPOND; PRESCRIBED; ATTRIBUTE; NAME; 

ASSIGN; NEW; COMPONENT 
Derwent Class: T01 

International Patent Class (Main) : G06F-007/00; G06F-017/21 
International Patent Class (Additional) : G06F-012/00 
File Segment: EPI 
Matching source and target data by comparing source and target data nodes
to determine percentage measure of similarity
Patent Assignee: INFOGLIDE CORP (INFO-N); RIPLEY J R (RIPL-I); WHEELER D B
(WHEE-I); WOTRING S C (WOTR-I)
Inventor: RIPLEY J R; WHEELER D B; WOTRING S C 



Number of Countries: 097 Number of Patents: 004
Patent Family:
Patent No Kind Date
WO 200213049 A1 20020214
US 20020055932 A1 20020509
WO 2001US24628 A
Date
20000804
20010806
Week
Priority Applications (No Type Date): US 2000223449 P 20000804; US
2001682207 A 20010806
Patent Details: 

Patent No Kind Lan Pg Main IPC
WO 200213049 A1 E 64 G06F-017/00
Designated States (National): AE
Designated States (Regional): AT
AU 200181111 A G06F-017/00 Based on patent WO 200213049
EP 1317715 A1 E G06F-017/00 Based on patent WO 200213049
Designated States (Regional): AL AT BE CH CY DE DK ES FI FR GB GR IE IT
LI LT LU LV MC MK NL PT RO SE SI TR


Abstract (Basic): WO 200213049 A1

NOVELTY - Method consists in selecting comparison methods such as
exact string match, similarity string composition, synonym table lookup
etc., comparing the source and target data structure nodes and
determining a measure of similarity between them. Each node comprises
an element name, data type attribute and an attribute description
value.

DETAILED DESCRIPTION - A strategy list assigns the comparison 
methods to each node name and value and data is automatically mapped to 
the target data node if the similarity measure exceeds a threshold. 
Each node is represented by HTML, XML or SGML and the comparison 
steps are repeated recursively. 
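
The matching idea can be sketched in Python as follows. This is a simplified illustration, not the patented method: it compares node names only, it omits the recursive descent into child nodes, and the synonym table, the threshold of 0.8 and the names `STRATEGY` and `map_nodes` are assumptions. The similarity measure is computed with `difflib.SequenceMatcher` from the standard library, which stands in for whatever string similarity the patent uses.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; pairs are treated as symmetric.
SYNONYMS = {("surname", "last_name")}


def exact(a: str, b: str) -> float:
    """Exact string match: 1.0 or 0.0."""
    return 1.0 if a == b else 0.0


def similar(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def synonym(a: str, b: str) -> float:
    """Synonym table lookup: 1.0 if the pair is listed, else 0.0."""
    return 1.0 if (a, b) in SYNONYMS or (b, a) in SYNONYMS else 0.0


# Strategy list: the comparison methods to try for node names.
STRATEGY = [exact, similar, synonym]


def map_nodes(source: dict, target_names: list, threshold: float = 0.8) -> dict:
    """Copy each source value to the best-matching target node name."""
    mapped = {}
    for name, value in source.items():
        best_name, best_score = None, 0.0
        for target in target_names:
            score = max(method(name, target) for method in STRATEGY)
            if score > best_score:
                best_name, best_score = target, score
        if best_name is not None and best_score > threshold:
            mapped[best_name] = value  # similarity exceeds the threshold
    return mapped


source_record = {"surname": "Ripley", "city": "Austin", "zzz": "x"}
mapping = map_nodes(source_record, ["last_name", "City", "phone"])
```

Multiplying a score by 100 gives the percentage measure mentioned in the title; source nodes that never exceed the threshold are left unmapped.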

There is an INDEPENDENT CLAIM for a data matching computer program. 

USE - Method is for sharing data held in different databases with 
different formats and structures over the Internet 

ADVANTAGE - Method saves time and money by not requiring data 
sources to homogenize information before interchanging. 

DESCRIPTION OF DRAWING (S) - The figure shows an overview of 
heterogeneous database searching. 

pp; 64 DwgNo 1/11 

Title Terms: MATCH; SOURCE; TARGET; DATA; COMPARE; SOURCE; TARGET; DATA; 

NODE; DETERMINE; PERCENTAGE; MEASURE; SIMILAR 
Derwent Class: T01 

International Patent Class (Main) : G06F-007/00; G06F-017/00 
International Patent Class (Additional) : G06F-017/30 
File Segment: EPI 
WPI Acc No: 2001-311950/200133

XRPX Acc No: N01-223668 

Attribute extractor for structurized documents, extracts and outputs 
attribute value corresponding to indexed position, obtained by 
comparing input document content with prestored attribute schema 

Patent Assignee: FUJI XEROX CO LTD (XERF ) 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

JP 2001075974 A 20010323 JP 99246880 A 19990901 200133 B 

Priority Applications (No Type Date) : JP 99246880 A 19990901 
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
JP 2001075974 A 25 G06F-017/30 

Abstract (Basic) : JP 2001075974 A 

NOVELTY - Contents of input document (1a) are compared with
prestored attribute schema (1f). Attribute name and its index
position corresponding to document content are extracted from attribute
schema, respectively by extractors (1b, 1c). Attribute names for
positions not indexed are deleted. Attribute data corresponding to
indexed positions is extracted by extractor (1d), which outputs the data
as a list (1e).

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following: 

(a) Attribute extracting method; 

(b) Recording medium with attribute extracting program 

USE - For detecting convergence of attributes in structurized 
documents specified in standard generalized markup language (SGML) , 
extensible markup language ( XML ) . 

ADVANTAGE - Required attribute is extracted simply without breaking 
the format of document and without being conscious of variations in 
document . 

DESCRIPTION OF DRAWING(S) - The figure shows the conceptual
diagram of attribute extractor (The drawing includes non-English
language text).
Input document (1a)
Extractors (1b-1d)
List (1e)
Attribute schema (1f)
pp; 25 DwgNo 1/29

Title Terms: ATTRIBUTE; EXTRACT; DOCUMENT; EXTRACT; OUTPUT; ATTRIBUTE; 

VALUE; CORRESPOND; INDEX; POSITION; OBTAIN; COMPARE ; INPUT; DOCUMENT; 

CONTENT; ATTRIBUTE 
Derwent Class: T01 

International Patent Class (Main) : G06F-017/30 

International Patent Class (Additional): G06F-017/21; G06F-017/27 
File Segment : EPI 
Attribute extractor of structurizing document, compares structurizing
document with character row pattern, based on which attribute name 
and attribute value of structurizing document are extracted 

Patent Assignee: FUJI XEROX CO LTD (XERF ) 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

JP 2000259660 A 20000922 JP 9964504 A 19990311 200061 B 

Priority Applications (No Type Date) : JP 9964504 A 19990311 
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
JP 2000259660 A 21 G06F-017/30 

Abstract (Basic) : JP 2000259660 A 

NOVELTY - Attribute name showing the attribute of a
structurizing document (1a) and character row pattern corresponding to
the attribute name, are defined by a schema definition unit (1b).
The structurizing document is compared with the character row pattern,
based on which attribute name and attribute value of the
structurizing document are extracted.

USE - For extracting and grouping row of desired attribute from 
structurizing document such as hypertext markup language document, 
extensible markup language document, standard generalized mark up 
language document. 

ADVANTAGE - Enables extracting required attribute name and 
attribute value . Enables identifying paragraph between documents. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
attribute extractor. 

Structurizing document (1a)
Schema definition unit (1b)

pp; 21 DwgNo 1/39 

Title Terms: ATTRIBUTE; EXTRACT; DOCUMENT; COMPARE; DOCUMENT; CHARACTER; 

ROW; PATTERN; BASED; ATTRIBUTE; NAME; ATTRIBUTE; VALUE; DOCUMENT; EXTRACT 
Derwent Class: T01 

International Patent Class (Main) : G06F-017/30 

International Patent Class (Additional): G06F-017/21; G06F-017/27 
File Segment: EPI 
WPI Acc No: 2000-044640/200004

XRPX Acc No: N00-034220 

Text file searching system in internet - has keyword filter to 
selectively delete attribute name index when its repetition is 
detected 

Patent Assignee: NEC CORP (NIDE ) 

Number of Countries: 001 Number of Patents: 002 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

JP 11306205 A 19991105 JP 98129485 A 19980423 200004 B 

JP 3191762 B2 20010723 JP 98129485 A 19980423 200143 



Priority Applications (No Type Date) : JP 98129485 A 19980423 
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
JP 11306205 A 21 G06F-017/30 

JP 3191762 B2 21 G06F-017/30 Previous Publ . patent JP 11306205 

Abstract (Basic) : JP 11306205 A 

NOVELTY - File search demand in natural language expression is
investigated to acquire attribute name index and attribute value,
using which a keyword for searching is extracted. A filter (5)
selectively deletes attribute name index when its repetition is
detected. Attribute value and name index from filter are then
used to search the required file.

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is also included for
recording medium storing text file searching program.

USE - For searching text file such as XML in internet by natural
language expression search inquiry.

ADVANTAGE - Redundancy of reply corresponding to search demand is
eliminated by keyword filter. User desired file can be retrieved
easily, by natural language expression demand.

DESCRIPTION OF DRAWING(S) - The figure shows the block diagram of text
file searching system.
(5) Keyword filter.

Dwg.1/7 

Title Terms: TEXT; FILE; SEARCH; SYSTEM; KEYWORD; FILTER; SELECT; DELETE; 

ATTRIBUTE; NAME; INDEX; REPEAT; DETECT 
Derwent Class: T01; W01 

International Patent Class (Main) : G06F-017/30 
File Segment: EPI 
(c) 2003 European Patent Office. All rts. reserv.

Method and computer system for publishing information
Verfahren und System zum Publizieren von Informationen
Methode et systeme pour la publication d'informations
PATENT ASSIGNEE: 

iUniverse. com, Inc., (3276360), 910 E. Hamilton Avenue, Suite 100, 
Campbell, CA 95008, (US), (Applicant designated States: all) 
INVENTOR: 

Tarn, Richard K. , 15498 Via Caballero, Monte Sereno, CA 95030, (US) 
Dunbar, Steve M., 900 Pepper Tree Lane, No. 1718, Santa Clara, CA 95051, 
(US) 

Nguyen, Young C, 3238 Via Del Mar, San Jose, CA 95124, (US) 
LEGAL REPRESENTATIVE: 

Kirschner, Klaus Dieter, Dipl.-Phys. (6506), Schneiders & Behrendt 

Rechtsanwalte - Patentanwalte Sollner Strasse 38, 81479 Munchen, (DE) 
PATENT (CC, No, Kind, Date): EP 1139253 A1 011004 (Basic)
APPLICATION (CC, No, Date) : EP 2001106127 010313; 
PRIORITY (CC, No, Date) : US 536192 000326 
DESIGNATED STATES: DE; FR; GB; IT 

EXTENDED DESIGNATED STATES: AL; LT; LV; MK; RO; SI 
INTERNATIONAL PATENT CLASS: G06F-017/60 

ABSTRACT EP 1139253 A1

A method and a system take submissions of information offered for
distribution or sale, combine partially or entirely at least two
submissions to form a combination, and distribute the combination in one
or more forms. The system takes submissions from authors automatically
over a network such as the Internet. Authors provide files for 
publication e.g., in XML files. Authors also provide the contractual 
terms for their publications. The system stores the submissions in two 
parts: content and descriptors that describe the content. On receipt of 
an order for distribution or sales, the system combines the contents and 
descriptors from at least two submissions to form a combination of the 
submissions. The system stores the contractual terms so that the authors
are paid according to the distribution or sales of their
publications. Customers provide their purchase orders for publications to
the system. Customers can purchase publications in part or in whole as 
permitted by the contractual terms regarding the publications. Customers 
can also combine a publication with another publication or a personalized 
content submitted by the customers as permitted by the contractual terms 
regarding the publications. Customers can further select the output forms 
of their purchases, such as print media or electronic media, as permitted 
by the contractual terms regarding the publications. 

ABSTRACT WORD COUNT: 214 

NOTE: 

Figure number on first page: 1 

LEGAL STATUS (Type, Pub Date, Kind, Text) : 
Application: 011004 A1 Published application with search report
Examination: 020612 A1 Date of request for examination: 20020403
Change: 020821 A1 Designated contracting states changed 20020628
Withdrawal: 030723 A1 Date of withdrawal of application: 20030526

LANGUAGE ( Publication, Procedural , Application) : English; English; English 

FULLTEXT AVAILABILITY: 
CLAIMS A (English) 200140 1113

SPEC A (English) 200140 13362 
Total word count - document A 14475 
Total word count - document B 0 
Total word count - documents A + B 14475 



...SPECIFICATION is performed by software shown in the middle of p. 7 in
file Parse.sqlj of Appendix D. In action 44, ingest engine 29 saves
attribute names and attribute values of the XML tag node in
iu_attribute table 167 of content management database 32.
Action 44 is followed by action 47. Action 44 is performed by software
shown in the middle of p...
A system for treating saved queries as searchable documents in a document
management system
System zum Behandeln von abgespeicherten Suchanfragen als durchsuchbare
Dokumente in einem Dokumentenmanagementsystem
Systeme de traitement d'interrogations stockees comme documents qu'on peut
chercher dans un systeme de gestion de documents

ABSTRACT EP 1104901 A2

A file management appliance ("FMA") is a device that utilizes multiple
processes and queues to provide document capture and indexing services as 
part of a document management system. Through the document capture and 
indexing services of an FMA-based system, documents are archived into one 
or more data storage devices, thereby forming a document database. One 
mechanism by which users may search for and access such archived data 
from one or more document databases is by formulating and submitting one 
or more queries to the FMA system. Queries formulated within an FMA 
system are treated as documents within the FMA system and accordingly may 
be archived within the document database for later retrieval and 
execution . 

ABSTRACT WORD COUNT: 115 

NOTE: 

Figure number on first page: 2 

LEGAL STATUS (Type, Pub Date, Kind, Text) : 
Application: 010606 A2 Published application without search report 

Examination: 010606 A2 Date of request for examination: 20000823 

Search Report: 011107 A3 Separate publication of the search report 
...SPECIFICATION file contains special information about the document such
as, for example, bibliographic data extracted from the capturing device. 
In one embodiment, document metadata consists of pairs of attribute 
names and their values . 

Figure 4 is a table illustrating one embodiment of an FMA metadata 
file. In Figure 4, document metadata attributes are listed along with 
each attribute's meaning. 

Figure 5 is a table illustrating a second embodiment of an FMA 
metadata file. In Figure 5, document metadata attributes are listed along 
with their acceptable value types. 

Figure 6 illustrates one embodiment of an FMA metadata file in 
extensible markup language ( XML ) . The partial metadata code depicted 
in Figure 6 is illustrative of what might be produced for a document that 
was captured by user "jones" (line... 

...up to disk 37 (line 690). 

In the event that an FMA encounters a metadata file that is not 
well-formed (as defined by the XML specification available from the 
World Wide Web Consortium (W3C) at http://www.w3.org), then in one 
embodiment, that FMA replaces the metadata with a... 

English Abstract

A comprehensive electronic business support system comprises three 
layers: (1) the business layer, including various smart components which 
unify data and business processes across all customer interactions; (2) 
the integration layer, including various communications messaging 
interfaces and enterprise application integration adapters, which provide
a flexible, automated, and process driven solution for integrating across 
business applications and operations support systems; and (3) the 
presentation layer, including various customer views, which are presented 
via particular business portals. A smart component server provides the 
core services and comprehensive business process logic required to 
successfully conduct business online. The communications messaging 
interfaces integrate with back-office systems for functions such as 
billing, provisioning, and interconnection. 

French Abstract

Systeme de support global pour commerce electronique qui comporte trois
couches, dont (1) la couche commerciale, comportant divers composants
intelligents qui homogeneisent les processus de donnees et commerciaux
pour toutes les interactions avec les clients, (2) la couche
d'integration, y compris diverses interfaces de messagerie de
communications et adaptateurs d'integration d'applications d'entreprise,
qui fournissent une solution souple, automatisee et commandee par les
processus pour l'integration des applications inter-commerciales et des
systemes de support d'operations, et (3) la couche de presentation, y
compris diverses presentations a l'intention des utilisateurs, qui sont
presentees via des portails commerciaux particuliers. Un serveur a
composants intelligents fournit les services cles et une logique globale
de processus commerciaux, requis pour effectuer avec succes du commerce
en ligne. Les interfaces de messagerie de communications s'integrent dans
des systemes d'arriere-guichet pour des fonctions telles que la
Detailed Description

SECTION 2. DATA STRUCTURE

This section describes the data structure for the eBusiness support 
system which includes various database tables. 

2.1. AREA TABLE DESCRIPTIONS

All tables of the database are grouped according to their area. Each 
table lists the name of each field attribute, whether or not the field 
is required, the type and length, and a description for each field. 

(1) Populating tables manually 

Every row in most database tables uses an Object Identifier (OID) as
the primary key. To manually populate a row in a database, the entry for the
row's OID must be determined for that area, using the SEQ table. The SEQ
table contains an entry for each area of the application (the SEQ-NAME
attribute) and a VALUE. The value is the next number to be used as an OID
for that area, and is the number to be used as the OID...

...once this number is used, it must also be manually incremented, so that
the next entry made will use a unique number.
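
The read-then-increment procedure can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the described steps: the in-memory database, the column spelling `SEQ_NAME` (the text writes SEQ-NAME), the area name `INVOICE`, the starting value 100 and the helper `next_oid` are assumptions.

```python
import sqlite3


def next_oid(conn: sqlite3.Connection, area: str) -> int:
    """Return the next OID for `area` and increment the stored value."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            'SELECT "VALUE" FROM SEQ WHERE SEQ_NAME = ?', (area,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no SEQ entry for area {area!r}")
        # Increment so that the next entry made will use a unique number.
        conn.execute(
            'UPDATE SEQ SET "VALUE" = "VALUE" + 1 WHERE SEQ_NAME = ?', (area,)
        )
        return row[0]


# Hypothetical setup: one area whose next free OID is 100.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE SEQ (SEQ_NAME TEXT PRIMARY KEY, "VALUE" INTEGER)')
conn.execute("INSERT INTO SEQ VALUES ('INVOICE', 100)")

first = next_oid(conn, "INVOICE")
second = next_oid(conn, "INVOICE")
```

Doing the read and the increment in one helper avoids the failure mode the text warns about, where a number is used but the stored value is never advanced.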

(2) Type 

Some tables have a TYPE field, a class indicator field used by the
persistence layer to determine the subclass for a particular object. If
the database is the correct value, the TopLink Builder console is used
to view the inheritance properties for the superclass object.

(3) Write Lock 

Most tables have a WRITE-LOCK field. This field works with the 
persistence layer to provide an optimistic lock that prevents access to a 
field if it is in the process of updating. 

(4) Primary Keys 

The field name OID denotes the primary key for a table . Any other field 
that uses "-OID" in its name is a foreign key. 



(5) Dates.

All unset dates are treated as Y.

AGENT ... Name Allows Nulls? Type Description

Table 2.20 represents an invoice charge item that represents usage
charges imported from an external billing system.

Table 2 INVOICE-USAGE-CHARGE-ITEM
Attribute Name       Allows Nulls?  Type        Description
OID                  N              NUMBER(18)  Object identifier for the invoice
                                                usage charge
START DT             N              ...
...                                             ...domain type for the "to" service
                                                identifier
FROM-SVC-DOMAIN-CD   N              NUMBER(9)   Code indicating the service
                                                domain type for the "from"
                                                service identifier

Table 2.21 is a collection of attributes representing an Adjustment
Request to a Customer Bill.

Table 2 BILL-ADJMNT-RQST
Attribute Name       Allows Nulls?  Type        Description
OID                  N              NUMBER(18)  Object identifier for the bill
                                                adjustment request
BLNG-POINT OID...
English Abstract

A system and method for creating a database of metadata (metabase) [14]
of a variety of digital media content, including TV and radio content
delivered on Internet. This semantic-based method captures and enhances
domain or subject specific metadata of digital media content, including
the specific meaning and intended use of original content. To support
semantics, a WorldModel [11] is provided that includes specific domain
knowledge, ontologies as well as a set of rules relevant to the original
content. The metabase [14] may also be dynamic in that it may track
changes to any variety of accessible content, including live and
archival TV and radio programming.

French Abstract

L'invention concerne un systeme et un procede de creation d'une base de
donnees de meta-donnees (meta-base) [14] d'une variete de contenus
mediatiques numeriques, y compris le contenu televisuel et radiophonique
delivre sur Internet. Ce procede fonde sur la semantique capture et
ameliore des meta-donnees de sujets ou de domaines specifiques de
contenus mediatiques numeriques, y compris la signification specifique et
l'utilisation projetee de contenus originaux. Afin de supporter la
semantique, un modele du monde [11] comprend des connaissances de
domaines specifiques, des ontologies, ainsi qu'un ensemble de regles
pertinent pour le contenu original. La meta-base [14] peut egalement etre
dynamique en ce qu'elle peut pister des changements dans n'importe quel
type de contenu accessible, y compris la programmation televisuelle et
radiophonique en direct et d'archives.

Legal Status (Type, Date, Text) 

Publication 20010920 A1 With international search report.

Examination 20011213 Request for preliminary examination prior to end of 

19th month from priority date 

Fulltext Availability: 
Detailed Description 

Detailed Description 

... Web sites and retrieve digital media metadata from selected pages. 

An extractor program takes HTML pages and extraction rules as input
and generates XML assets such as that shown in Fig. 6. These generated 
assets contain values for each attribute name belonging to the 
domain of that Web site. Once created, the assets are sent to a Metabase 
Agent that is in charge of enhancing and. . . 

...order to enhance the assets, the Metabase Agent uses information stored 
in the WorldModel as well as a Knowledgebase. The Knowledgebase is a
collection of tables containing domain-specific information and 
relationships. After insertion into the metabase, the assets are then 
ready to be searched. 

The purpose of a WebCrawler is... 
English Abstract

Methods and systems consistent with the present invention provide a means 
for searching numerical data across networks such as the Internet, and 
removing the middle layer of query engines or servers used by 
conventional systems in retrieving data from relational databases over 
the Internet. The methods and systems in accordance with the present
invention also provide a means for tying millions of computers together
into a single database, so that a query introduced to the system returns
a table of data, as a single database would.
Furthermore, the methods and systems consistent with the present 
invention provide the means for performing navigational, line item (or 
record-level), semantic, numerical, transformational, arithmetic, 
time-dependent, and cost based queries on numerical data. In addition, a 
user may also conduct select queries between unrelated databases. 

French Abstract

La presente invention concerne des procedes et des systemes fournissant
un moyen pour rechercher des donnees numeriques dans des reseaux tels
qu'Internet, et eliminer la couche centrale des moteurs de requete ou des
serveurs utilises par les systemes classiques pour extraire des donnees a
partir de bases de donnees relationnelles dans Internet. Ces procedes et
ces systemes fournissent egalement un moyen pour relier des millions
d'ordinateurs ensemble dans une seule base de donnees, de maniere a ce
qu'une requete introduite dans le systeme restitue une table de donnees
comme une seule base de donnees. Par ailleurs, ces procedes et ces
systemes fournissent un moyen pour realiser des requetes de navigation,
d'article de ligne (ou de niveau d'enregistrement), semantiques,
numeriques, transformationnelles, d'arithmetique, dependantes du temps,
et des requetes basees sur le cout sur des donnees numeriques. De plus,
l'utilisateur peut egalement gerer des requetes de selection entre les
bases de donnees non reliees.
... be copied to a local source and the restriction on cataloging numbers
is relaxed. 

At this step, the index of documents is created. A standard XML 
processor is used to proceed through each document, element by element, 
and then attribute by attribute to create an index of the major elements 
and attributes (step 1108) . 

In XML, "element" text values are the values appearing between opening
and closing tags (e.g., "<data-source>US Census Bureau</data-source>"
would have an element...

...tag itself. For example, "<line-item li-legend='Total Revenues'></line-item>"
would have an attribute named "li-legend"
which has a value of "Total Revenues."

The XML processor collects both elements and attributes into name / 
value pairs and calls a software routine (or method) as each is 
completed. The "handler" method takes the URL, element /attribute name, 
and value and creates a new record in the index data table with those 
three values. The "cache" RDL documents are also handed off to an 
additional handler which collects all the attributes and elements into a 
relational table (step 1111). During the search process, the data
query processor 204 will use this relational cache to create the RDL 
elements in...of items that may be indexed, for example, element names
(what is found within the "< >"s), element values (what is found
between the "< >"s), attribute names (what is found within the "<
>"s and before an "="), attribute values (what is found between the
"< >"s and after "="), and various types of metadata outside of the
documents (e.g., number of hits, response time of...
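
The indexing pass described above can be sketched in Python with the standard `xml.etree.ElementTree` parser. This is an illustrative sketch rather than the implementation in the specification: the function `build_index`, the root tag `rdl`, the example URL and the tag and attribute spellings in the sample document are assumptions. A handler is called once per completed name/value pair and writes a three-field record (URL, name, value).

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Record = Tuple[str, str, str]  # (url, element or attribute name, value)


def build_index(url: str, xml_text: str) -> List[Record]:
    """Walk a document element by element, then attribute by attribute."""
    index: List[Record] = []

    def handler(name: str, value: str) -> None:
        # One new record in the index table per name/value pair.
        index.append((url, name, value))

    for element in ET.fromstring(xml_text).iter():
        text = (element.text or "").strip()
        if text:
            handler(element.tag, text)  # element value: text between the tags
        for attr_name, attr_value in element.attrib.items():
            handler(attr_name, attr_value)  # attribute name and value
    return index


# Hypothetical document modelled on the two examples in the text.
doc = (
    "<rdl>"
    "<data-source>US Census Bureau</data-source>"
    '<line-item li-legend="Total Revenues"/>'
    "</rdl>"
)
records = build_index("http://example.com/doc.xml", doc)
```

Element names and attribute names land in the same column here, which matches the remark in the text that the two kinds of name can share a column when they never overlap.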

...10 shows an index with four (4) columns, other implementations may have
different numbers of columns. Discussed below are three different
implementations of an index table, each with a different set of tag
type information. Note that attribute names and legend names are
mixed together in the same column: there is no overlap between attribute
and element names in RDL. One skilled in the art with XML experience
will recognize that this efficiency can be relaxed to accommodate
languages without this feature by adding an additional column called
"tag type."

In one implementation, attribute names and values are recorded. 
This leads to the smallest index: easiest to create and fastest to 
search. To really test a query, however, requires going to the. . . 
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English Abstract 

A method for testing a web-based application comprising a plurality of 
forms such method comprising the steps of defining a form flow of ones of 
a plurality of forms, defining test parameters for a test; creating a 
test script file defined by a test parameters and a form flow, and, 
generating a plurality of sets of form requests in accordance with a test 
script, wherein ones of a plurality of sets of form requests are 
generated for each permutation of test parameters. 
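
As a rough sketch of the last step of the abstract, the generation of one set of form requests per permutation might look like the code below. It assumes that "permutation of test parameters" means every combination of the candidate values of each parameter; the function name, the form names and the parameter names are hypothetical and do not come from the patent.

```python
from itertools import product


def generate_form_requests(form_flow, test_parameters):
    """Build one set of form requests per combination of parameter values."""
    names = sorted(test_parameters)
    request_sets = []
    for values in product(*(test_parameters[name] for name in names)):
        params = dict(zip(names, values))
        # One request per form in the flow, all sharing this combination.
        request_sets.append(
            [{"form": form, "params": params} for form in form_flow]
        )
    return request_sets


# Hypothetical form flow and test parameters.
form_flow = ["login", "search", "checkout"]
test_parameters = {"browser": ["ie", "netscape"], "locale": ["en", "fr"]}
request_sets = generate_form_requests(form_flow, test_parameters)
print(len(request_sets), "sets of", len(request_sets[0]), "requests each")
```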

French Abstract 

The invention concerns a method for testing a Web-based application that
comprises several forms. The method consists of defining a form flow
among certain forms of a plurality of forms, defining test parameters
for a test, creating a test script file defined by the test parameters
and said form flow, and generating several sets of form requests
according to the test script, one of said sets of form requests being
generated for each permutation of said test parameters.
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Claim 

... access the data in a distributed manner across the Internet.

External hosting requires remote execution of data. The implementation of
Unicode as part of Extensible Markup Language (XML) and Extensible
Stylesheet Language (XSL) standard is one of the reasons allowing
applications to be made international. Unicode is a standard for
interchanging, processing, and...

...method for testing the application comprising the steps of (a) creating a 
test script file defined by test cases in a test script using an XML 
based markup language; and 

(b) providing a test script executive for running said test script file.
In accordance with a further aspect of the invention...XSL stylesheet
interpreter with the appropriate HTML and the stylesheet to create the
formatted markup of the appropriate MIME type. In some cases both the
XML input and the XSL stylesheet may be provided to the client browser
to interpret if the client has an XSL built. By way of background,
Extensible Markup Language, abbreviated XML, describes a class of
data objects called XML documents and partially describes
the behavior of computer programs which process them. XML is an
application profile or restricted form of SGML, the Standard Generalized
Markup Language. By construction, XML documents are conforming SGML
documents. XML documents are made up of storage units called entities,
which contain either parsed or unparsed data. Parsed data is made up of
characters...which form character data, and some of which form markup.
Markup encodes a description of the document's storage layout and logical 
structure. XML provides a mechanism to impose constraints on the 
storage layout and logical structure. A software module called an XML 
processor is used to read XML documents and provide access to their 
content and structure. It is assumed that an XML processor is doing its 
work on behalf of another module, called the application, which resides 
on the web server 108. Each XML document has both a logical and a 
physical structure. Physically, the document is composed of the units 
called entities. An entity may refer to other... 

...all of which are indicated in the document by explicit markup. The
logical and physical structures must nest properly.


SUBSTITUTE SHEET (RULE 26) 

Each XML Document contains one or more elements, the boundaries of 
which are delimited by start-tags and end-tags. Each element has a type, 
identified by name, sometimes called its "generic identifier" (GI), and 
may have a set of attribute specifications. Each attribute 
specification has a name and a value. Style Sheets can be associated 
with an XML document by using a processing instruction whose target is
xml-stylesheet. This processing instruction follows the behavior of
HTML 4. The xml-stylesheet processing instruction is parsed in the same
way as a start-tag, with the exception that entities other than
predefined entities must...

...stylesheet to be applicable to a wide class of documents that have
similar source tree structures. Each stylesheet describes rules for
presenting a class of XML source documents. An XSL stylesheet processor
accepts a document or data in XML and an XSL stylesheet and produces
the presentation of that XML source content as intended by the
stylesheet. There are two sub-processes to this presentation process.
First, a result tree is constructed from the XML source tree. Second,
the result tree is interpreted to produce a formatted presentation on a
display, on paper, in speech or onto other media. The...

...obtain
data for the URL from the databases 218 and 216. The data server 206 
retrieves the appropriate data in XML , and forwards it to the run-time 
204. The run-time 204 adds runtime and directory information to the XML 
data. The data structure that is built or populated by the run-time 204 
from HML is termed RML, the structure of which will be... 

...shown at numeral 300 a defined schema or data structure for the elements
contained in an HML application. All of the elements are described as
XML-based schemas including attributes and elements. The first element
of the HML 300 is an Application Element 301 which is a root element
comprising the ... contained within an events element which contains
multiple component elements 309;

Multiple Directory elements 308 containing information to connect to
directory type data. Contain a name attribute. Connection Element 303



None. 

<batch> element 

Represents a list of scripts to be performed in batch. This element is 
optional and repeatable. 
Parent Elements

Element          Remarks
<application>

Attributes

Attribute      Description
Name           Item's distinguished name.
Description    Description of the batch.

Text Content

None.

Child Elements

Order    Element          Remarks
1        <batchmember>    Optional, Repeatable.

Remarks

The...
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database (54) . A searcher (52) using SQL query engines performs queries 
to retrieve the documents and sends the results of the query in XML form 
to the users. 
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Detailed Description 

. . . simple SQL queries to retrieve the content desired by the user. 

In accordance with another aspect of the invention, a computer system for 
storing an XML document using a relational database is provided wherein 
the system comprises a converter that receives an XML document and 
generates relational database tables based on the structure of the XML 

document. The converter further comprises a software module that 
generates a unique name attribute for each node in the XML 
document, a software module that generates a path attribute for a 
particular node of the XML document wherein the path attribute 
comprises a list of the name attributes for the one or more nodes 
from the particular node to a root node of the XML document, a software 
module that generates an order attribute for the particular node, the 
order attribute comprising an enumerated order of the particular node 
from the root node to the particular node, and a software module that 
generates a NodeValue attribute containing a value of the particular 
node. Collectively these attributes are called encodings that result in 
efficient storage, indexing and searching of XML documents without 
destroying the underlying hierarchical structure of the documents. 
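
A minimal sketch of such a converter is given below, assuming Python's standard xml.etree.ElementTree and sqlite3 modules. The concrete encodings are assumptions made for illustration only: unique names are formed as tag plus a document-order counter, the path is a slash-separated list of names from the node up to the root, and the order is a dotted list of sibling positions from the root down to the node. The table and column names are likewise hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Return (name, path, ord, node_value) rows, one per element node."""
    rows = []

    def visit(elem, parent_path, order):
        name = f"{elem.tag}-{len(rows) + 1}"   # unique name for this node
        path = [name] + parent_path             # this node first, root last
        rows.append((
            name,
            "/".join(path),
            ".".join(str(position) for position in order),
            (elem.text or "").strip(),
        ))
        for position, child in enumerate(elem, start=1):
            visit(child, path, order + [position])

    visit(ET.fromstring(xml_text), [], [1])
    return rows


rows = encode("<order><item>pen</item><item>ink</item></order>")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (name TEXT, path TEXT, ord TEXT, node_value TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", rows)

# A simple SQL query retrieves the values of all item nodes in document order.
values = [row[0] for row in conn.execute(
    "SELECT node_value FROM nodes WHERE path LIKE 'item-%' ORDER BY rowid"
)]
print(values)
```

Because each row carries its own path and order, the hierarchy can be rebuilt from the flat table, which is the property the passage attributes to the encodings.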

In accordance with yet another aspect of the invention, a data structure 
that stores a . . . 

Claim 

... an XML document and generates a relational database 
table based on the XML document; 

the converter further comprising a software module that generates a 
unique name attribute for each node in the XML document, a 
software module that generates a path attribute for a particular node of 
the XML document wherein the path attribute comprises a list of the 
name attributes for the one or more nodes from the particular node to 
a root node of the XML document, a software module that generates an 
order attribute for the particular node, the order attribute comprising 
an enumerated order of the particular node from the root node to the
particular node, and a software module that generates a NodeValue 
attribute containing a value of the particular node. 

13. A method for manipulating an XML document using a relational
database, comprising: 

generating a relational database table based on an. . . 
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English Abstract 

A system and method for automatic, knowledge-based processing and 
structuring of information from marked-up documents such as SGML, XML, 
HTML based documents. The system includes three parts: an offer 
processing (1000), an offer presentation (2000) and a database system 
(3000) that the offer processing (1000) and the offer presentation (2000) 
use. The offer processing (1000) builds an offer database in the database
system (3000) by accessing data sources such as Internet sites, intranet
files, retrieving pages and processing them. The offer presentation
(2000) allows users to access the offer database and retrieve information
from it. The offer presentation (2000) also includes servers that may be
accessed by different web-enabled means such as web browsers, cellular
phones, PDAs, digital TV, Internet appliances, voice-activated user
interfaces.

French Abstract 

The invention concerns a system and a method for automatic,
knowledge-based processing and structuring of information from universal
marked-up documents (any type of document that contains structural,
presentation and semantic information (such as tags) associated with
content, e.g. SGML/XML/HTML documents). The present invention can be
applied to the Internet or to intranets, extranets and other networked
information sources. The fundamental technique of the present invention
is an offer processing system and method. The offer processing system
gathers, stores, processes, extracts and presents offers of information,
products or services. The system automatically gathers offers from a very
large number of information sites, and does so without resorting to
site-specific adjustments or to knowledge of the specific organization
of offer data in the pages of a given site. An optional feature of the
present invention is a front-end system intended to present structured
offers to users or to third parties. One application of the offer
processing method and system may be, for example, a comparison shopping
system and method. This application is a very large-scale, automatic
comparison shopping engine that gathers offers of products or services
from an indefinite number of Internet vendors or service providers, both
worldwide (via the Web) and locally (click & mortar). Job offers, real
estate, housing, business-to-business (B2B) offers, etc. are further
examples of information that can be gathered.

Legal Status (Type, Date, Text) 

Publication 20010419 A2 Without international search report and to be 

republished upon receipt of that report. 
Search Rpt 20011018 Late publication of international search report 
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Claim 

the web page's data structure, apart from the trivial assumption that the
page is being presented in the generally used HTML format (later an XML
format option will be added).

C Algorithm's output 

If the algorithm succeeds in fetching one or more product offerings in the
relevant product category ... almost all web pages today) contains no
definitive meaningful data structure. This was one of the main reasons
for the development of structured formats like XML. Looking at
the general context, this might be true. This novel algorithm, however,
finds structure in the structureless web pages sub-population of...

...we should make some practical assumptions that are not a big
constraint in the consumer e-commerce environment.
Those assumptions are: 
A.1.1 HTML/XML

Merchant content may be presented as an HTML (in the
future also XML) document (applicable to almost all the
textual content on the WWW).
A.1.2 Weak order of the content layout

We can't take ... represents the certainty level of this assumption (it is
like a probability value, but we call it Likelihood-mark because it is not
a real probability value).

A 5 Attribute set 

A set S of attributes . 

A 6 Single value field 

A field F will be marked as "single value field" if in a legitimate 
product offer record, there will be just one value associated with that 
field. 

A 7 Combination value field 

A field F will be marked as "Combination value attribute" if in a
legitimate product offer record, this field can get a list of different
values, and the meaning of this list is that the...TA such that:
* S ! is a legitimate product record in the product category PC.
* PA is a price attribute (that means that PA is an attribute,
PA.Field-name = "Price").



File 8:Ei Compendex(R) 1970-2003/Nov W2
    (c) 2003 Elsevier Eng. Info. Inc.
File 35:Dissertation Abs Online 1861-2003/Oct
    (c) 2003 ProQuest Info&Learning
File 202:Info. Sci. & Tech. Abs. 1966-2003/Nov 17
    (c) 2003 EBSCO Publishing
File 65:Inside Conferences 1993-2003/Nov W3
    (c) 2003 BLDSC all rts. reserv.
File 2:INSPEC 1969-2003/Nov W2
    (c) 2003 Institution of Electrical Engineers
File 233:Internet & Personal Comp. Abs. 1981-2003/Jul
    (c) 2003, EBSCO Pub.
File 94:JICST-EPlus 1985-2003/Nov W3
    (c) 2003 Japan Science and Tech Corp(JST)
File 603:Newspaper Abstracts 1984-1988
    (c) 2001 ProQuest Info&Learning
File 483:Newspaper Abs Daily 1986-2003/Nov 17
    (c) 2003 ProQuest Info&Learning
File 6:NTIS 1964-2003/Nov W3
    (c) 2003 NTIS, Intl Cpyrght All Rights Res
File 144:Pascal 1973-2003/Nov W2
    (c) 2003 INIST/CNRS
File 434:SciSearch(R) Cited Ref Sci 1974-1989/Dec
    (c) 1998 Inst for Sci Info
File 34:SciSearch(R) Cited Ref Sci 1990-2003/Nov W2
    (c) 2003 Inst for Sci Info
File 99:Wilson Appl. Sci & Tech Abs 1983-2003/Oct
    (c) 2003 The HW Wilson Co.
File 583:Gale Group Globalbase(TM) 1986-2002/Dec 13
    (c) 2002 The Gale Group
File 266:FEDRIP 2003/Sep
    Comp & dist by NTIS, Intl Copyright All Rights Res
File 95:TEME-Technology & Management 1989-2003/Nov W1
    (c) 2003 FIZ TECHNIK
File 438:Library Lit. & Info. Science 1984-2003/Oct
    (c) 2003 The HW Wilson Co



Set    Items    Description
S1     17297    XML OR (EXTENSIBLE OR XTENSIBLE)()(MARKUP OR MARK()UP)
S2       554    NAME? ?(3N)ATTRIBUTE? ?
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XML document can be displayed in different ways by using different 
style sheets, or the same style sheet could govern the display of similarly 
structured XML documents. 

The nonproprietary nature of XML, combined with its ease of writing, 
makes XML an ideal format for data exchange among applications. 

A Simple XML. . . 

...some time making her document easy to read with judicious use of tabs, 
white space and blank lines. 

XML processing instructions and tags often use XML attributes,
which are name-value pairs separated by an equals sign; the values must
be enclosed in quotes. (Another difference between XML and HTML is that
most HTML values do not require the quotes.) The XML declaration requires
the version attribute and here also uses the optional standalone
attribute. For now, know that this XML declaration states that the
document conforms to XML version 1.0, and does not require any other
documents for parsing its content.

This document does not contain any display or formatting information. 
Therefore, one would need a style sheet, and a way of telling the XML 
document to use that style sheet to display the XML document. Let's use
the simple style sheet below, saved as forfirst.css in the same directory
as the XML document. Notice that the style sheet refers to the tag
<First> used in the document.

First {display: block; font-size: 36pt;
font-weight: bold; color: #00FF00; }

The second processing instruction below associates the
stylesheet with the XML document:

<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="forfirst.css"?>
<First>
My First XML Document
</First>
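
To see how a program might read the name-value pairs of the stylesheet instruction, here is a small sketch that assumes Python's standard xml.dom.minidom module. The function name stylesheet_attributes is made up, and the regular expression only handles double-quoted values, which is enough for this document but not for the general case.

```python
import re
from xml.dom import minidom

DOCUMENT = """<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="forfirst.css"?>
<First>
My First XML Document
</First>"""


def stylesheet_attributes(xml_text):
    """Return the name-value pairs of the first xml-stylesheet instruction."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The parser hands over the instruction body as plain text,
            # so the name="value" pairs are split out by hand.
            return dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data))
    return {}


attrs = stylesheet_attributes(DOCUMENT)
print(attrs)
```

The parser treats the body of a processing instruction as opaque text, which is why the pairs have to be picked apart separately rather than read as ordinary attributes.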
XML Document Components 

XML documents are text consisting of data and markups. The data is 
what the author encodes; the markups tell the XML parser how this data is 
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... form elements. To make sure your code works with today's browsers 

and with existing scripts, you can use both a name and an ID attribute.
- Attribute values must be quoted, and no minimization is
allowed. A common practice among HTML coders is to leave the quotes out
when specifying values for elements...

...in XHTML. Some attributes, such as "checked," could be minimized when
using several browsers. This also is not valid. You can't have a dangling
attribute:

<input id="acheckbox" name="acheckbox" checked /> is incorrect.

<input id="acheckbox" name="acheckbox" checked="checked" /> is
correct.

- XHTML documents have some mandatory elements. You no longer can
have documents ... a minimal XHTML document</title>

</head>
<body>
</body>
</html>
Rules of XML

Another key difference is that XHTML documents must conform to XML
rules. Here are the pertinent XML rules for XHTML developers (for more
information, see www.xml.com):

- All XML documents are well-formed by definition. A well-formed
document adheres to the XML structure but need not follow a particular DTD.
A document that also follows a particular DTD is called valid. In XHTML,
the DTD is that of HTML...
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... the three required DTDs. 

Minimal XHTML 

The following code, taken from the XHTML proposed recommendation, is
an example of a minimal XHTML 1.0 document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Virtual Library</title>
</head>
<body>
Moved to <a href="http://vlib.org/">vlib.org</a>.
</body>
</html>

Some specifics... 

...Empty elements must either have an end tag, or the start tag must end 
with />. This is sometimes called a self-terminating element. 

Element and attribute names. XML is case-sensitive, and under the
XHTML DTDs, element and attribute names must be in lower case. All
attribute values must be quoted in single or double quotes:

Nested elements. Elements must also be properly nested, so that
closing tags must be in reverse order of the opening tags.

No minimized attributes. XML, and therefore XHTML, does not support
attribute minimization.
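
These rules can be tried out with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are illustrative. It checks well-formedness only, so the lower-case requirement, which comes from the XHTML DTDs rather than from XML itself, is not tested here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


fragments = [
    '<input id="acheckbox" checked="checked" />',  # quoted, not minimized
    '<input id="acheckbox" checked />',            # minimized attribute
    "<table rows=3></table>",                      # unquoted attribute value
    "<P>text</p>",                                 # tag names differ in case
    "<b><i>text</b></i>",                          # improperly nested
]
results = [is_well_formed(fragment) for fragment in fragments]
for fragment, ok in zip(fragments, results):
    print("ok " if ok else "bad", fragment)
```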

Script and style tags. Because any < and & characters are considered 
parts of tags in XHTML, any script... 
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... 3>Network and Systems Management with XML </article... 

. . . report> 

Once elements have been defined, a DTD may also define attributes
using the !ATTLIST command. This specifies an element, names the
attribute to be associated with it, and then exerts control over the
values that attribute can have. For example, the following would
associate the attribute manufacturer with the element car, allowing the
former to have one of the three values ... include !ENTITY declarations,
which define entity references, and !NOTATION declarations, which let a
parser know what to do with binary files that are not in XML format.

A serious and rather surprising limitation of DTDs, however, is that
they do not permit datatyping, that is, constraining data to a particular
format...
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Softquad Software Inc. is one of the latest vendors to build a new 
tool simplifying development in XML , which is attracting growing 
corporate interest as a standard for distributing data across Web-based 
environments . 

XMetal, scheduled to ship in March, is one of the few authoring tools 
that provide features to speed up the process of writing in Extensible 
Markup Language . 

"There's very few XML /SGML (Standard Generalized ML) editors on the 
market right now, " said Betty Harvey, an analyst at Electronic Commerce 
Connection Inc., of Germantown, Md., a consulting... 

. . .which resembles a standard word processor, requires users to first 
choose a DTD (Document Type Definition) and configure the tool bars to 
insert related tag names , attributes , attribute values and other 
elements on demand. 

"XMetal will only let you use those XML elements that are valid for 
the DTD you specify, " Harvey said. 

DTDs define the format codes, or tags, embedded within XML 
documents. Each industry — health care, insurance and so on — uses different 
DTDs. 

XMetal has an extensible object management system, called Resource 
Manager, for storing boilerplate... 

...Microsoft Corp.'s COM (Component Object Model), developers can use any 
tool that supports COM to build wizards and add them to the XMetal editor. 



XML is a subset of SGML, a text-based language for describing the content
and structure of digital documents.

The wizards can be used to build customized functions within the XML 
documents. The new product allows developers to save SGML documents as 
XML , providing an easy migration path. 

XMetal supports HTML 4.0 elements, such as forms, tables and links, 
as well as other standards approved by the... 
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... can be invoked to update the element. In this preliminary 

definition, there are three basic methods: addAttribute, getAttribute, and
removeAttribute. The addAttribute method sets a value for a named
attribute; the getAttribute method returns a reference to a named attribute
so that you can examine its values; and removeAttribute does just that,
which effectively sets some attributes back to their default value.
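
For comparison, the same three operations can be sketched with Python's standard xml.dom.minidom. This is only an analogy to the preliminary definition described above: minidom names the first method setAttribute rather than addAttribute, its getAttribute returns the value as a string (getAttributeNode returns the attribute node itself), and it does not restore DTD defaults on removal. The element and attribute names are made up.

```python
from xml.dom import minidom

doc = minidom.parseString('<car manufacturer="Acme"/>')
car = doc.documentElement

car.setAttribute("color", "red")       # set a value for a named attribute
color = car.getAttribute("color")      # read the value back as a string
car.removeAttribute("manufacturer")    # remove a named attribute
has_manufacturer = car.hasAttribute("manufacturer")

serialized = car.toxml()
print(color, has_manufacturer, serialized)
```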
Language-specific Bindings. . . 

...are the language-specific bindings for the document object model, which 
map language-specific calls and data to language-neutral calls and data. 
The core, XML , and HTML object models have different sets of bindings. 
Because those bindings are language-specific, they must be implemented in 
each programming language separately. That... 
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transaction as incomplete and inform the reader that the current 
transaction is pending. 

The final attribute for the purchase element is an enumerated type. 
The attribute has four name tokens that define the valid values for
the attribute. From these tokens, you can determine that purchase
transactions are defined as walk-in orders, Web-related orders, phone 
orders or mail orders. The default value for the attribute is "walkin." 
Thus if no value is specified, the purchase transaction is assumed to be a 
walk-in order. 
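
The effect of an enumerated attribute with a default can be imitated in application code, as in the sketch below. It assumes Python's standard xml.etree.ElementTree, which does not apply DTD defaults itself, so the default is supplied by hand. Only the default token "walkin" comes from the text; the attribute name "type" and the other three token spellings are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical token names; only "walkin" (the default) appears in the text.
PURCHASE_TYPES = ("walkin", "web", "phone", "mail")
DEFAULT_TYPE = "walkin"


def purchase_type(xml_text, attribute="type"):
    """Return the purchase type, falling back to the default when absent."""
    elem = ET.fromstring(xml_text)
    value = elem.get(attribute, DEFAULT_TYPE)
    if value not in PURCHASE_TYPES:
        raise ValueError(f"{value!r} is not one of {PURCHASE_TYPES}")
    return value


unspecified = purchase_type("<purchase/>")
by_phone = purchase_type('<purchase type="phone"/>')
print(unspecified, by_phone)
```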

Completing the DTD 

After defining the... 
...add some additional information to the document, such as your company 
name or copyright information. Although you could insert this information 
directly into the document, XML does provide a mechanism that makes it 
easier to maintain documents over time. This mechanism is called an entity. 

In its basic form, an entity... 




...entities allow text and files to be substituted into a document, they 
can be used to replace values when a document is displayed. 

Unlike HTML, XML allows you to define your own entities. Let's say 
your company name is CPC Enterprises Ltd., and you want to add this 
information to. . . 
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Due to ship in Q1 of 1999, Veo Systems' XML parser will bridge the
gap between XML and Java by being the first parser to support the Schema 
for Object Oriented XML (SOX) submission to the W3C XML -Schema working 
group. The Veo parser will combine unprecedented speed with XML DTD and 
SOX validation. 

SOX extends XML for e-commerce and distributed computing by adding 
strong typing, inheritance, global name spaces, and legal attribute 
values . These extensions enable validity checking and facilitate mapping a 
document to other documents as well as to the rich data schemas used in 
databases and object oriented software applications. 

"Veo is leading the way to couple the power of XML and Java. Veo 
Systems' parser will bridge the gap between XML and Java and by giving
developers a familiar typed programming model for manipulating XML 
documents and make XML more powerful for the Java developer," stated Dr. 
Jay Tenenbaum, chairman and chief scientist of Veo Systems. "We are looking 
forward to working with our customers who deploy Sun's JDK in building 
innovative e-commerce and business integration solutions." 

Veo Systems is currently accepting applicants for the XML parser 
early access program. Anyone interested in participating should visit Veo's 
website at www.veosystems.com/xml/parser/parser.html

About Veo Systems, Inc. 

Veo Systems is the leading supplier of products and services to 
enable open commerce networks. Using Veo's solutions, companies can 
significantly lower the economics of business-to-business integration by 
exchanging information using self-describing XML business documents that 
both people and computers can readily understand. 

Veo's world-class technology team pioneered the application of XML 
to electronic commerce. Veo Systems is located in Mountain View, Calif, and 
can be reached at 650/988-7244 or via the Internet at http. . . 
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... be a boon. A simplified version of SGML for the Web would be one 

logical way to address the limitations of HTML. 



The Rules Of XML 




This is where XML comes in. Its design goal was to be a subset of 
SGML useful for the Web. Writing XML sounds like a daunting task, but 
this isn't necessarily true. Unlike HTML, miscoded documents in XML won't 
render at all. For those Webmasters considering how many of their HTML 
documents would render if that language had such strict rule enforcement, 
you can stop sweating. The rules of basic XML are easy. Suppose you need 
to define some tags to represent an invoice in XML — merely create a 
document as shown in the green chart below: 

Naming a tag <ENTRY> instead of <ITEM> is completely up to the 
document author. Choose any element and attribute names that represent 
the domain being modeled. Does XML have any more rules? Yes, but they are 
few, and only relate to syntax. 

The first rule is that, just like well-written HTML, all ...tags must 
be properly nested and must match. There also must be an enclosing element 
for the whole document. 

The second rule is that all attribute values must be quoted. In 
HTML this is good authoring practice but is only required for values that 
contain spaces. 

In XML, <BLASTOFF COUNT="10"> is correct. <BLASTOFF COUNT=10> is not
correct.

The final rule is that all elements with empty content must be 
self-identifying by. . . 

...instead of the familiar ">". An empty element is one that does not
require a closing tag, like the HTML elements <BR>, <HR> and <IMG>. In XML
these would be <BR/>, <HR/> and <IMG SRC="test.gif"/>. Why is this last
rule needed? Because XML documents may not have a formal DTD associated
with them. Lacking a DTD, there is no way for a parser to know if a tag
like <BR> is empty or requires a matching </BR> tag later in the document.
For parsing efficiency, XML needs a syntactic signal to identify empty
tags.
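
The point about the syntactic signal can be observed directly. The following sketch assumes Python's standard xml.etree.ElementTree; the function name tag_events and the sample strings are illustrative. With <BR/> the parser reports the start and the end of the element at once, while a bare <BR> is read as an element that is still open, so the closing tag of the enclosing element no longer matches.

```python
import io
import xml.etree.ElementTree as ET


def tag_events(xml_text):
    """List (event, tag) pairs in the order the parser reports them."""
    stream = io.BytesIO(xml_text.encode("utf-8"))
    return [
        (event, elem.tag)
        for event, elem in ET.iterparse(stream, events=("start", "end"))
    ]


# The "/>" tells the parser that BR is complete without any DTD.
events = tag_events("<DOC>line one<BR/>line two</DOC>")
print(events)

# Without the signal, BR stays open and </DOC> is a mismatched tag.
try:
    tag_events("<DOC>line one<BR>line two</DOC>")
    unclosed_ok = True
except ET.ParseError:
    unclosed_ok = False
print(unclosed_ok)
```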

An XML document made according to these rules is a "well-formed" 
document. SGML purists may find this notion eccentric and somewhat 
troubling. However, just because a... 
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InfoWorld: Is one of the potential uses of XML to enable more 
sophisticated searching than is available today? 

Maloney: This is one of the most obvious and compelling benefits XML 
could provide. 

InfoWorld: To enable searches on specific categories, say on "Welsh
corgis," won't vertical industry segments have to agree on tags for
XML-based markup languages such as the Dog Breeder's Markup Language (DBML)?

Maloney: That is darned close to being completely accurate. The SGML
(Standard Generalized Markup Language)/XML pedants would tell you that
you don't have to agree on a set of tags (element types), but could instead
agree on a set of attribute names and values. In fact, the use of
attributes to provide deeper meaning is a characteristic of "architectural 
forms." The good news is that there are quite a... 

...already deployed. These tools are equipped to handle any number of 
markup languages. 

In general, Web tools will have to be equipped to read an XML
document, present it according to the rules specified by a style sheet
(which may be Cascading Style Sheets, Extensible Style Sheets, or some
other style...
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. . .TEXT: phase, the next-to-last step before XHTML becomes an official 
Recommendation. The next-generation Web standard, for all purposes a
merger of HTML and XML, should be approved by the time you read this.

XHTML is the most significant change to the language of the Web since HTML 
4.0. . . 

... approved in 1997. XHTML is a family of document types and modules that 
extend the functionality of HTML 4.0 and has its roots in XML . Documents 
based on XHTML are designed to work with the new wave of Webenabled 
devices, including cell phones and PDAs. 

In brief, XHTML documents conform to XML and can be viewed, edited, and 
validated with standard XML tools. However, XHTML documents can also be 
viewed by existing HTML 4 . 0-compliant browsers and other user agents. XHTML 
documents can also run processes (scripts and applets) that are based on 



the HTML Document Object Model^P>OM) or the XML DOM. 



XHTML contains several important syntax changes from HTML. Since XML is
case-sensitive, XHTML documents must use lower case for all HTML element
and attribute names. Also, XHTML is strict in its interpretation of
tags; this means that all elements must either have closing tags or be
written in a special...

. . . creates a paragraph break, is often used without its closing partner
</p>. This is unacceptable in XHTML. Other changes include the need for
quotes in attribute values. For example, the tag
<table rows="3"> is correct; <table rows=3> is incorrect.
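
The closing-tag and quoting rules above are XML well-formedness rules, so a generic XML parser can serve as a rough check. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample fragments are illustrative and do not come from the article.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Quoted attribute value and explicit closing tags: accepted.
ok = '<table rows="3"><tr><td><p>cell</p></td></tr></table>'
# Unquoted attribute value: rejected by an XML parser.
unquoted = '<table rows=3></table>'
# <p> used without its closing partner </p>: rejected.
unclosed = '<body><p>first paragraph<p>second paragraph</body>'

for sample in (ok, unquoted, unclosed):
    print(is_well_formed(sample), sample)
```

This only tests well-formedness. It does not check the markup against an XHTML DTD, so it says nothing about whether an element or attribute is actually permitted.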

In creating XHTML documents, authors should note that pages can be labeled
as text/html, text/xml, or application/xml. When labeled as text/html,
however, documents that don't follow standard HTML Compatibility Guidelines
will almost certainly fail to be processed, according to XHTML...

. . . will be carried out on alternate platforms, that is, non-PC-based
applications. New protocols like WAP, which target small-footprint
microbrowsers, are already implementing XML. However, Web content must be
written specifically in WML, WAP's markup language, in order to be
processed quickly and efficiently. XHTML will give page...

... form elements. To make sure your code works with today's browsers
and with existing scripts, you can use both a name and an ID attribute.

Attribute values must be quoted, and no minimization is
allowed. A common practice among HTML coders is to leave the quotes out
when specifying values for elements...

...in XHTML. Some attributes, such as "checked," could be minimized when
using several browsers. This also is not valid. You can't have a dangling
attribute:

<input id="acheckbox" name="acheckbox" checked /> is incorrect.

<input id="acheckbox" name="acheckbox" checked="checked" /> is
correct.
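
The expansion of a minimized attribute into the name="name" form can be sketched mechanically. The example below is an illustration only, assuming Python's standard html.parser module, which tokenizes lenient HTML and reports a minimized attribute with the value None. The class name AttributeExpander and the function expand are hypothetical, and the sketch rewrites start tags only; it is not a full HTML-to-XHTML converter.

```python
from html import escape
from html.parser import HTMLParser


class AttributeExpander(HTMLParser):
    """Collect start tags re-emitted with quoted, non-minimized attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def _emit(self, tag, attrs, empty):
        parts = [tag]
        for name, value in attrs:
            # A minimized attribute such as `checked` arrives with value None;
            # XHTML wants it written out as checked="checked".
            if value is None:
                value = name
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.tags.append("<" + " ".join(parts) + (" />" if empty else ">"))

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs, empty=False)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, empty=True)


def expand(markup):
    """Return the start tags found in `markup`, rewritten XHTML-style."""
    parser = AttributeExpander()
    parser.feed(markup)
    parser.close()
    return parser.tags


for tag in expand('<input id=acheckbox name="acheckbox" checked />'):
    print(tag)
```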

- XHTML documents have some mandatory elements. You no longer can
have documents... <body>

</body>

</html>

Rules of XML 

Another key difference is that XHTML documents must conform to XML
rules. Here are the pertinent XML rules for XHTML developers (for more
information, see www.xml.com):




- All XML documents are well-formed by definition. A well-formed
document adheres to the XML structure but need not follow a particular DTD.
A document that also follows a particular DTD is called valid. In XHTML, the
DTD is that of HTML...



16/3, K/14 (Item 2 from file: 647) 

DIALOG (R) File 647: CMP Computer Fulltext 
(c) 2003 CMP Media, LLC. All rts. reserv. 

01215262 CMP ACCESSION NUMBER: IWK20000508S0072 

XHTML: A Bridge To The Future - THE W3CS RECOMMENDATION BLENDS XML AND 
HTML TO PRODUCE EXTENSIBLE WEB-PAGE FORMATTING 

DON KIELY 

INFORMATIONWEEK, 2000, n 785, PG210 
PUBLICATION DATE: 000508 

JOURNAL CODE: IWK LANGUAGE : English 

RECORD TYPE: Fulltext 

SECTION HEADING: Application Development 
WORD COUNT: 1903 

... the three required DTDs. 

Minimal XHTML 

The following code, taken from the XHTML proposed recommendation,
is an example of a minimal XHTML 1.0 document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
lang="en">
<head>
<title>Virtual Library</title>
</head>
<body>
<p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
</body>
</html>
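
Because an XHTML document is XML, a generic XML parser can read a minimal document like the one above. The following is a small sketch, assuming Python's standard xml.etree.ElementTree module; the constant names are illustrative. The parser checks well-formedness only: it does not fetch the DTD or validate against it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix

MINIMAL = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
  </body>
</html>
"""

# Parse from bytes so the encoding declaration is honored as written.
root = ET.fromstring(MINIMAL.encode("utf-8"))
ns = {"x": XHTML_NS}

# Element names carry the XHTML namespace declared on <html>.
print(root.tag)
print(root.find("x:head/x:title", ns).text)
print(root.find(".//x:a", ns).get("href"))
print(root.get("{%s}lang" % XML_NS), root.get("lang"))
```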

Some specifics... 

...Empty elements must either have an end tag, or the start tag must end
with />. This is sometimes called a self-terminating element.

Element and attribute names. XML is case-sensitive, and in the
XHTML DTDs, element and attribute names must be in lower case. All
attribute values must be quoted in single or double quotes:

Nested elements. Elements must also be properly nested, so that 
closing tags must be in reverse order of the opening tags. 

No minimized attributes. XML, and therefore XHTML, does not
support attribute minimization.



Script and style tags. Because any < and & characters are
considered parts of tags in XHTML, any script...
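
The nesting and case rules listed above can be observed directly with an XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name parse_error and the sample fragments are illustrative. The exact wording of the error messages depends on the underlying parser, so only their presence is shown.

```python
import xml.etree.ElementTree as ET


def parse_error(fragment):
    """Return the parser's complaint, or None if the fragment is well-formed."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Closing tags in reverse order of the opening tags: accepted.
nested_ok = "<p><b><i>text</i></b></p>"
# Closing tags out of order: rejected.
nested_bad = "<p><b><i>text</b></i></p>"
# XML is case-sensitive, so <P> and </p> are different names: rejected.
case_bad = "<P>text</p>"

for sample in (nested_ok, nested_bad, case_bad):
    print(sample, "->", parse_error(sample))
```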

... be a boon. A simplified version of SGML for the Web would be one
logical way to address the limitations of HTML.

The Rules Of XML

This is where XML comes in. Its design goal was to be a subset of
SGML useful for the Web. Writing XML sounds like a daunting task, but
this isn't necessarily true. Unlike HTML, miscoded documents in XML
won't render at all. For those Webmasters considering how many of their
HTML documents would render if that language had such strict rule
enforcement, you can stop sweating. The rules of basic XML are easy.
Suppose you need to define some tags to represent an invoice in XML;
merely create a document as shown in the green chart below:

Naming a tag <ENTRY> instead of <ITEM> is completely up to the document
author. Choose any element and attribute names that represent the
domain being modeled. Does XML have any more rules? Yes, but they are
few, and only relate to syntax.

The first rule is that, just like well-written HTML, all ...tags
must be properly nested and must match. There also must be an enclosing
element for the whole document.

The second rule is that all attribute values must be quoted. In 
HTML this is good authoring practice but is only required for values that 
contain spaces. 

In XML, <BLASTOFF COUNT="10"> is correct. <BLASTOFF COUNT=10> is not correct.

The final rule is that all elements with empty content must be
self-identifying...

...instead of the familiar ">". An empty element is one that does not
require a closing tag, like the HTML elements <BR>, <HR> and <IMG>. In XML
these would be <BR/>, <HR/> and <IMG SRC="test.gif"/>. Why is this
last rule needed? Because XML documents may not have a formal DTD
associated with them. Lacking a DTD, there is no way for a parser to know
if a tag like <BR> is empty or requires a matching </BR> tag later in the
document. For parsing efficiency, XML needs a syntactic signal to
identify empty tags.
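
The empty-element rule and the single enclosing element rule described above can be illustrated with a short sketch, assuming Python's standard xml.etree.ElementTree module. The element names DOC and INVOICE and the helper parses are hypothetical; the invoice chart from the article is not reproduced here.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if `text` is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# <BR/> and <BR></BR> are two spellings of the same empty element,
# so both trees serialize identically.
short_form = ET.fromstring("<DOC>line one<BR/>line two</DOC>")
long_form = ET.fromstring("<DOC>line one<BR></BR>line two</DOC>")
same = ET.tostring(short_form) == ET.tostring(long_form)

# Without a DTD the parser cannot know that <BR> is meant to be empty,
# so the HTML-style spelling is treated as an unclosed element.
html_style = parses("<DOC>line one<BR>line two</DOC>")

# A document needs exactly one enclosing element.
two_roots = parses("<INVOICE/><INVOICE/>")

print(same, html_style, two_roots)
```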

An XML document made accordin 



